In this section, we further clarify why a naive application of data augmentation with certain RL algorithms is theoretically unsound. The correct estimate of the policy gradient objective used in PPO is the one in equation (1) (or equivalently, equation (8)), which does not use the augmented observations at all, since we are estimating advantages for the actual observations, A(s, a). The probability distribution used to sample advantages is π_old(a|s) rather than π_old(a|f(s)), since we can only interact with the environment via the true observations and not the augmented ones (the reward and transition functions are not defined for augmented observations). Hence, the correct importance sampling estimate uses π(a|s)/π_old(a|s); using π(a|f(s))/π_old(a|f(s)) instead would be incorrect for the reasons above. In contrast, DrAC does not change the policy gradient objective, which remains the one in equation (1), and instead uses the augmented observations only in the additional regularization losses, as shown in equations (3), (4), and (5).
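The separation above can be sketched in code. The following is a minimal, hypothetical NumPy sketch (the function name, signature, and regularization weight `alpha_r` are illustrative, not the paper's implementation): the clipped PPO surrogate is computed with the ratio π(a|s)/π_old(a|s) on true observations only, while augmented observations f(s) enter solely through the policy and value regularizers.

```python
import numpy as np

def drac_objective(logp_new, logp_old, adv,
                   pi_s, pi_fs, v_s, v_fs,
                   clip_eps=0.2, alpha_r=0.1):
    """Illustrative sketch of a DrAC-style objective.

    logp_new / logp_old: log pi(a|s) under new/old policy -- TRUE observations only.
    pi_s / pi_fs: action distributions for true and augmented observations.
    v_s / v_fs:   value estimates for true and augmented observations.
    """
    # Clipped PPO surrogate: uses pi(a|s)/pi_old(a|s), never the augmented ratio.
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    # Regularizers: augmented observations appear ONLY here.
    g_pi = np.sum(pi_s * (np.log(pi_s) - np.log(pi_fs)), axis=-1)  # KL(pi(.|s) || pi(.|f(s)))
    g_v = (v_s - v_fs) ** 2
    return surrogate.mean() - alpha_r * (g_pi.mean() + g_v.mean())
```

With identical old/new log-probabilities and matching augmented outputs, both regularizers vanish and the objective reduces to the mean advantage, which makes the role of each term easy to check in isolation.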
Bayesian Optimization for Iterative Learning
Vu Nguyen, Sebastian Schulze, Michael A. Osborne
The success of deep (reinforcement) learning systems crucially depends on the correct choice of hyperparameters, which are notoriously sensitive and expensive to evaluate. Training these systems typically requires running iterative processes over multiple epochs or episodes. Traditional approaches consider only the final performance of a hyperparameter setting, although intermediate information from the learning curve is readily available. In this paper, we present a Bayesian optimization approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. First, we transform each training curve into a numeric score. Second, we selectively augment the data using auxiliary information from the curve. This augmentation step enables efficient modeling while preventing the ill-conditioning of the Gaussian process covariance matrix that occurs when the whole curve is added. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
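The two steps described in the abstract can be illustrated with a short sketch. This is a hypothetical toy version, not the paper's implementation: the curve-to-score transform here is an exponentially weighted average emphasizing later epochs, and "selective augmentation" is approximated by subsampling the curve at a fixed stride rather than adding every epoch, which keeps the Gaussian process covariance matrix from becoming ill-conditioned through near-duplicate points.

```python
import numpy as np

def curve_score(rewards, gamma=0.9):
    """Compress a training curve into a single numeric score.

    Illustrative choice: an exponentially weighted average that
    emphasizes later (more converged) epochs.
    """
    weights = gamma ** np.arange(len(rewards))[::-1]
    return float(np.sum(weights * rewards) / np.sum(weights))

def augment_observations(hyperparam, rewards, stride=25):
    """Selectively add intermediate (hyperparam, epoch) -> score points.

    Only every `stride`-th epoch is kept, rather than the whole curve,
    so the GP training set stays small and well conditioned.
    """
    idx = np.arange(stride - 1, len(rewards), stride)
    return [(tuple(hyperparam) + (int(i) + 1,), curve_score(rewards[: i + 1]))
            for i in idx]
```

A 100-epoch curve with stride 25, for example, contributes only four GP observations, each pairing the hyperparameter with a training budget and the score of the curve truncated at that budget.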